Abstract. We report findings from a web-based experiment on noise annotation
lines, a method to represent attribute uncertainty. We tested and
compared three design aspects of noise annotation lines and evaluated how 
different design variations influence user performance. We systematically 
varied the number of uncertainty categories, noise width, and noise grain. 
Our results show that the number of uncertainty categories significantly 
influences user performance but that certain design characteristics can 
counterbalance the negative effect of an increased number of categories. 
Additionally, we were able to show that performance decreases if uncertainty
changes continuously.
1. Introduction

Visualization is a common means to represent uncertainty in spatio-temporal
data. A variety of methods for this purpose exists, especially in the
area of GIScience and scientific visualization (Brodlie et al. 2012, Slocum
2003). Different types of uncertainty are often identified (geometric,
attribute, temporal, or combinations) and displayed in different ways (e.g.,
static/dynamic, integrated view/adjacent view, interactive/non-interactive).
Several typologies were created to support the choice of a suitable method
for different purposes (MacEachren et al. 2005, Senaratne et al. 2012).



With regard to integrated views, a basic distinction between intrinsic and
extrinsic approaches can be made (Gershon 1998). Intrinsic approaches 
utilize visual variables from existing objects in the visualization to represent 
uncertainty, mostly including visual variables derived in cartography. Be- 
sides the seven visual variables described by Bertin (1983), other variables 
including color saturation, symbol focus, and clarity were added 
(MacEachren 1992, MacGranaghan 1993). Extrinsic approaches, on the 
other hand, incorporate additional graphical objects to represent uncertain- 
ty, e.g. glyphs (Pang 2001) or other objects such as bars or dials that are 
added to the display. Unlike most intrinsic variables, they can be visually 
separated from the content. 

Although a variety of methods for representing uncertainty have been
implemented, most of them have not been evaluated, or have been evaluated only
partially, with regard to usability. Moreover, most of the studies focus on
intrinsic methods.

In this paper we evaluate an extrinsic method we call noise annotation 
lines, a method first described as "procedural annotations" by Cedilnik 
and Rheingans (2000). It is a promising way to display attribute uncertain- 
ty in maps with heterogeneous geometries, e.g. land cover maps, and like 
most uncertainty visualization methods it has barely been evaluated. An 
exception is a qualitative evaluation by Zuk and Carpendale (2006), who 
evaluated procedural annotations from a theoretical standpoint using heuristics
from the theories by Bertin, Tufte, and Ware. They point out that the
data-ink ratio of a noise grid is relatively small and they hypothesize that 
with other annotation types more uncertainty categories would be possible 
to represent. They do not provide evidence for this but they remark that a 
more formal testing of procedural annotations, especially concerning per- 
ceptual aspects, would be interesting. 

This work contributes to the evaluation of extrinsic uncertainty visualization
methods by testing usability aspects of noise annotation lines, focusing
on the impact of design factors on the usability of the method.



2. Uncertainty Visualization Using Noise Annotation Lines

Many thematic maps (e.g., land cover maps) contain objects of high geo- 
metric variability, that is, objects that differ considerably in size and shape. 
Representing uncertainty integrated into such maps can lead to cluttered 
displays. Most commonly used uncertainty visualization methods such as 
color saturation or transparency work well with uniform areas but become
harder to perceive for areas that are geometrically diverse. Extrinsic methods,
especially those based on uniform grids, seem promising because they
are independent of the underlying geometry.




Figure 1. Noise annotation lines representing classification uncertainty on a
vegetation land cover map. The local width of the noise grid indicates the degree
of uncertainty (larger noise width = higher degree of uncertainty).

Using noise annotation lines, a regular grid is placed onto the map and is 
distorted locally to represent the degree of uncertainty. While the authors of
the original paper proposed four different versions of annotations (variation
of width, sharpness, noise and amplitude), we decided to implement
and evaluate the noise grid because we expected noise to be a suitable
metaphor for uncertainty. In a qualitative pre-test, people found the noise
display particularly intuitive.

Noise annotation lines consist of a noise grid that is varied in size locally to 
represent the level of uncertainty (the more uncertain, the greater the 
width) whereas the number of noise particles and their size ("grain") remain
constant. The grid representation generalizes the original uncertainty
distribution as only the values covered by the lines are represented. However,
since the size of the grid cells can vary according to the scale of the map,
a compromise can be made between maximum coverage of uncertainty information
and minimum occlusion of the underlying content. This characteristic
makes this method promising for use in maps.



3. Methods 



An experiment was conducted to learn more about the usability of noise 
annotation lines as an uncertainty visualization technique. The central aspect
was the impact of different designs of the noise grid on the effectiveness 
and efficiency of the display. 

3.1. Research questions 

Our goal was to answer the following research questions:

1. How do different design parameters impact the effectiveness and efficiency
of noise annotation lines as a representation of attribute uncertainty
in a thematic map?

2. How does the maximum number of categories affect user performance?

3. Can users accurately compare the overall degree of uncertainty between
two defined areas when the values vary within the areas ("continuous"
uncertainty)?

3.2. Independent variables 

The appearance of noise annotation lines can be changed by altering differ- 
ent design parameters. Our hypothesis was that these changes may have an 
impact on the effectiveness and efficiency of the uncertainty display. The 
following three major design parameters were chosen as factors: number of
categories of uncertainty, the width, and the grain of the noise grid (Table 
1). 

The width of the noise grid is defined for the display of maximum uncer- 
tainty (100%) with respect to the size of the grid cells (Figure 2). A lower 
noise width covers a smaller area with noise; however, there is also less
space to represent varying uncertainty values. Consequently, the choice of 
this parameter is a compromise between interference with the underlying 
content and the number of categories that are discernible. If the grid width 
is too large, the gaps between the noise lines become so small that within 
highly uncertain areas you cannot see the grid structure anymore. This al- 
ready occurs with a noise width of 60% of the grid cell size. On the other 
hand, a width of less than 40% does not seem to be suitable to represent 
more than three categories of uncertainty (as revealed by a pre-test). Thus, 
we chose 40% and 50% of the grid cell size as levels of this factor. 
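
The trade-off between noise width and discernible categories can be made
concrete with a small calculation. The following sketch assumes that the noise
band grows linearly with uncertainty and uses a hypothetical grid cell size of
40 pixels; neither assumption is taken from the study, and all function names
are illustrative only.

```python
def band_width_px(uncertainty_pct: int, max_width_pct: int, cell_px: int) -> float:
    """Width of the noise band along a grid line, in pixels.

    Assumes (hypothetically) that the band grows linearly with uncertainty and
    reaches max_width_pct of the cell size at 100% uncertainty.
    """
    if not 0 <= uncertainty_pct <= 100:
        raise ValueError("uncertainty must be between 0 and 100 percent")
    return cell_px * max_width_pct * uncertainty_pct / 10000


def gap_px(uncertainty_pct: int, max_width_pct: int, cell_px: int) -> float:
    """Noise-free space left between two neighbouring grid lines."""
    return cell_px - band_width_px(uncertainty_pct, max_width_pct, cell_px)


def step_px(n_categories: int, max_width_pct: int, cell_px: int) -> float:
    """Difference in band width between two adjacent uncertainty categories."""
    return band_width_px(100, max_width_pct, cell_px) / (n_categories - 1)


CELL_PX = 40  # hypothetical cell size, not a value from the study

# Band width and remaining gap at maximum uncertainty for three noise widths.
for width in (40, 50, 60):
    print(width, band_width_px(100, width, CELL_PX), gap_px(100, width, CELL_PX))

# Pixel difference between adjacent categories at 40% noise width.
for n in (4, 5, 6):
    print(n, round(step_px(n, 40, CELL_PX), 2))
```

Under these assumptions, a wider band shrinks the gap between lines, while a
narrower band leaves only a few pixels of difference between adjacent
categories once five or six categories are shown.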

The grain of the noise particles is the second design parameter we
manipulated (Figure 3). A finer grid consists of more and smaller particles,
while a coarser grid contains fewer, larger particles. Since we kept a
constant resolution across all maps in this study (see section 3.4), we
implemented 1x1 pixels and 2x2 pixels for the factor "grain".

The number of uncertainty categories is varied as the third factor. With 
more categories the difference between two adjacent values decreases from 
33% to 20% uncertainty (Table 2). We chose a minimum of four categories 
because a pre-test revealed that three categories (i.e., 0%, 50%, and 100% 
uncertainty) are straightforward to discriminate in contrast to four catego- 
ries which already lead to errors. Five and six categories seemed to be more 
challenging, so we hypothesized that user performance would decrease in 
comparison to four categories. 



Factor                  Number of levels  Levels
Noise width             2                 Small (40%), large (50%)
Noise grain             2                 Fine (1x1), coarse (2x2)
Uncertainty categories  3                 4, 5, 6 categories

Table 1. Factors used in the study.



Figure 2. Design parameter "Noise width". Both grids represent the same degree of
uncertainty (100%), but with different widths: 40% (left) and 50% (right) of the
grid cell size.




Figure 3. Design parameter "Noise grain". Both grids represent the same degree of
uncertainty (100%), but with different grain sizes: "fine" (left) and "coarse" (right).





Number of categories  Step  Levels
4 categories          33%   0%, 33%, 66%, 100%
5 categories          25%   0%, 25%, 50%, 75%, 100%
6 categories          20%   0%, 20%, 40%, 60%, 80%, 100%

Table 2. Categories of uncertainty used in the study.
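
The levels in Table 2 are evenly spaced between 0% and 100%. A minimal sketch
that reproduces them, assuming that values are rounded down to whole percent as
the 33% and 66% entries suggest (the function name is illustrative):

```python
from typing import List


def uncertainty_levels(n_categories: int) -> List[int]:
    """Evenly spaced uncertainty levels in whole percent, rounded down."""
    if n_categories < 2:
        raise ValueError("at least two categories are needed")
    # Integer arithmetic avoids floating-point rounding surprises.
    return [100 * i // (n_categories - 1) for i in range(n_categories)]


for n in (4, 5, 6):
    print(n, uncertainty_levels(n))
```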



3.3. Tasks 

For the main part of the study we chose the following task: a comparison of
uncertainty categories between two defined areas in the map. This approach
allowed us to compare a total of 150 maps (see below). Still, this is a realistic
task, e.g., when analyzing a land cover map: a qualitative, pair-wise
comparison of different objects from the same class regarding their
uncertainty. The survey question remained the same for all maps: "Which area is
more uncertain?" Potential answers included "A is more uncertain", "B is
more uncertain", "A and B are equal" and "I can't tell". As the questions
were mandatory, the last answer was included to minimize nonsense answers
when participants could not read the map.

3.4. Maps 

We created 10 maps per factor combination to establish 10 repetitions. All 
maps were taken from the same vegetation land cover map representing 
equally sized areas (100 m x 100 m). Furthermore, the size of the noise grid
cells in all maps was the same. The maps showed two kinds of uncertainty:
discrete and continuous. "Discrete" maps show a constant uncertainty value
for each map object whereas "continuous" maps vary in value within
each area (Figure 4).



Figure 4. Discrete (left) and continuous (right) uncertainty distribution.

In addition, we varied the background colors according to a qualitative color
scheme recommended by ColorBrewer (http://colorbrewer.org/). The
utilized color scheme ("Paired") is indicated to be colorblind-safe and
laptop-/LCD-friendly. In each map, two square areas in the size of 3 x 3 noise
grid cells were drawn and labeled 'A' and 'B' (Figure 5). We placed both
squares on areas with the same background color, either light blue or light
green. Those two colors have a very similar contrast distance from the white
color of the grid, so we varied the color but not the contrast between grid
and background.




Figure 5. Example map from the study with 40% noise width, coarse grain and
four uncertainty categories. Participants were asked to compare the degree of
uncertainty in areas A and B. Area B is more uncertain than A.

3.5. Survey 

We used a 2x2x3 factorial design and 10 repetitions per factor combination
for discrete uncertainty visualizations, which resulted in 120 questions. In
order to test the third research question (discrete versus continuous), we
added maps showing a continuous uncertainty distribution. For this, we used
constant values for noise width and grain and varied only the number of
uncertainty categories. The 3 levels shown with 10 repetitions resulted in 30
questions for this section. In the survey, the maps with discrete (per-object)
uncertainty and those with continuous uncertainty were presented in randomized
order. In total, each participant answered 150 map questions.
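
The total of 150 questions follows directly from the design. The sketch below
enumerates the conditions; the tuple layout and the constant width and grain
chosen for the continuous maps ("40%", "fine") are assumptions made for
illustration, not details confirmed by this section.

```python
from itertools import product

WIDTHS = ("40%", "50%")
GRAINS = ("fine", "coarse")
CATEGORIES = (4, 5, 6)
REPETITIONS = 10

# Discrete maps: full 2x2x3 factorial design, 10 maps per combination.
discrete = [
    ("discrete", width, grain, cats, rep)
    for width, grain, cats in product(WIDTHS, GRAINS, CATEGORIES)
    for rep in range(REPETITIONS)
]

# Continuous maps: width and grain held constant, only the categories vary.
# The constant values used here are assumed for illustration.
continuous = [
    ("continuous", "40%", "fine", cats, rep)
    for cats in CATEGORIES
    for rep in range(REPETITIONS)
]

questions = discrete + continuous
print(len(discrete), len(continuous), len(questions))
```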

We conducted the experiment as a web-based survey, which made it possible
to use Amazon Mechanical Turk (see section 3.6). While we
were not able to control the display type, color calibration, or distractions
that potentially influence the participant, we did not expect relevant
differences when viewing our maps on different displays. Several studies show
that lab experiments are comparable to online experiments (Mason & Suri
2012).

The study had four parts: 

1) Introduction: The participants were provided with an introductory
explanation of uncertainty and noise annotation lines. Three figures of
noise annotation lines were shown (no, medium and high uncertainty)
to clarify the method. We also included a note not to use a smartphone
or similar device and, when using a tablet, not to zoom in and out, to
make sure that all subjects saw each map as a whole when answering the
questions.

2) Personal information: We asked for gender, age and a self-assessment
in terms of experience with uncertainty visualization in maps.

3) Map questions: This main section of the study showed the maps including
uncertainty. In each map, the two areas (A and B) were compared.
In order to avoid bias and learning effects we randomized the order of
the questions.

4) Comments: An opportunity to comment on the survey. 

All questions except the comments at the end were mandatory. LimeSurvey
(http://www.limesurvey.org), a user-friendly software freely available under
an open source license, was used to deploy the survey. We used
version 1.92+ as problems with the randomization occurred in the latest
version.

3.6. Participants 

We recruited participants using the online crowdsourcing service Amazon 
Mechanical Turk (AMT, http://mturk.com). The reasons for utilizing this
service are threefold: first, it was easy to recruit the subjects; second, we
aimed to obtain participants with different backgrounds and expertise (not only
from our domain); and third, paid participants were likely to be motivated to
finish the survey even though it took 20 to 30 minutes. Participants were
reimbursed with 50 cents for their participation.

Of the 22 participants, the majority declared themselves as female (13) and
9 as male. Regarding age, the largest group of participants classified
themselves as between 20 and 29 years old (9 out of 22), followed by 50 to 59
years (7/22). Very young and very old people as well as the group 40 to 49 years
were barely represented.



4. Results 



Concerning the subjects' experience with uncertainty maps, we asked three
questions: whether they had known about the concept of uncertainty before, how
often they use maps, and whether they had seen a map including uncertainty
information before. From these answers we determined a level of experience
per participant (little, average, extensive experience). Half of the
participants (11 out of 22) had little experience while about one-quarter each
had average experience (5/22) or extensive experience (6/22) with uncertainty
maps. This is not surprising as one can expect that participants acquired via
Amazon Mechanical Turk will be primarily lay people.

None of the participants produced data that should be considered an outlier
(Tabachnick & Fidell 2007). We analyzed both the accuracy of the answers and
the time it took participants to answer questions. Given the difficulty of
controlling time (compared to lab experiments), it does not come as a surprise
that time did not show any significant effects; it will not be considered
further. Accuracy is recorded as the percentage of correct answers for a set of
maps that belong to a factor combination. We had 10 maps for each combination
in our 3 (levels of uncertainty) x 2 (noise width) x 2 (noise grain) factorial
design. In case a participant responded that he or she was not able to provide
an answer, we treated this situation as a missing value. There were only 68
missing values out of 3300 (~2%) such that sufficient responses were collected
for each participant and each map.
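
The accuracy measure can be sketched as follows. How missing values enter the
percentage is not specified here, so the sketch assumes that "I can't tell"
answers are simply dropped from the denominator; the function name and the
example answers are hypothetical.

```python
from typing import Optional, Sequence

CANT_TELL = "I can't tell"


def accuracy_pct(responses: Sequence[str], correct: Sequence[str]) -> Optional[float]:
    """Percentage of correct answers among the answered questions.

    "I can't tell" responses are treated as missing and dropped from the
    denominator (an assumption; the study only states that they were treated
    as missing values).
    """
    answered = [(r, c) for r, c in zip(responses, correct) if r != CANT_TELL]
    if not answered:
        return None  # no usable answers for this factor combination
    hits = sum(1 for r, c in answered if r == c)
    return 100 * hits / len(answered)


# Hypothetical answers of one participant for the 10 maps of one combination.
correct = ["A", "B", "A", "equal", "B", "A", "B", "A", "equal", "B"]
responses = ["A", "B", "B", "equal", "B", CANT_TELL, "B", "A", "A", "B"]
print(accuracy_pct(responses, correct))

# Share of missing values reported in the study: 68 of 22 * 150 answers.
print(round(100 * 68 / (22 * 150), 1))
```

In this hypothetical example, 7 of the 9 answered questions are correct, so the
accuracy for the combination is roughly 78%.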

Repeated measures ANOVA with the three factors revealed the following:
Mauchly's test of sphericity shows that for some factors (levels of
uncertainty) and factor combinations the assumption of sphericity is violated.
Hence, we are using the Greenhouse-Geisser correction as suggested by Tabachnick
and Fidell (2007). Of the three main effects, only the levels of uncertainty
are statistically significant (F(33.4, 1.6)=6.96, p=.005), while noise width
and noise grain did not significantly change the perception of uncertainty.
However, both non-significant main effects show a statistically significant
interaction effect with the levels of uncertainty (levels of uncertainty * noise
width: F(42, 2)=3.69, p=.038; levels of uncertainty * grain: F(42, 2)=3.94,
p=.042). Figure 6 shows a graph that supports these findings visually.



[Figure 6: chart of mean accuracy (axis from 60% to 90%) for each combination of
noise width (40%, 50%), grain (fine, coarse) and number of categories (4, 5, 6).]

Figure 6. Mean accuracy for different factor levels: Accuracy values decrease when
more categories of uncertainty are displayed, especially with a smaller grid
width. The combination of larger grid size and coarse grain does not show this
behavior: The accuracy with 6 categories is roughly as high as with 4 categories
(last row). Please note that the chart starts at 60% accuracy.

A second repeated measures ANOVA compared the "discrete versus continuous"
presentation of uncertainty (see section 3.4), that is, whether uncertainty
within an area of interest is changing or not. The second main factor
is the levels of uncertainty. Mauchly's test of sphericity was not statistically
significant; however, we adopted the Greenhouse-Geisser correction. The
main factor of the discrete-vs.-continuous comparison showed a trend that
narrowly missed statistical significance (F(21, 1)=4.07, p=.057). The main
factor levels of uncertainty was statistically significant
(F(37.14, 1.77)=13.34, p<.001). There was no significant interaction between
these two factors.
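
The fractional degrees of freedom reported above result from the
Greenhouse-Geisser correction, which multiplies the uncorrected degrees of
freedom by an estimate epsilon. The sketch below shows this arithmetic for a
within-subject factor; the epsilon value is inferred from the reported error
degrees of freedom (37.14 / 42) and is not stated in the study, and the function
name is illustrative. Note that the text reports F values with the error
degrees of freedom first.

```python
from typing import Tuple


def gg_corrected_df(n_subjects: int, n_levels: int, epsilon: float) -> Tuple[float, float]:
    """Return (effect df, error df) after the Greenhouse-Geisser correction."""
    lower_bound = 1 / (n_levels - 1)
    if not lower_bound <= epsilon <= 1:
        raise ValueError("epsilon must lie between 1/(k-1) and 1")
    df_effect = n_levels - 1
    df_error = (n_subjects - 1) * (n_levels - 1)
    return epsilon * df_effect, epsilon * df_error


# Epsilon is inferred from the reported error df (37.14 / 42); it is not
# stated in the study.
epsilon = 37.14 / 42
df_effect, df_error = gg_corrected_df(n_subjects=22, n_levels=3, epsilon=epsilon)
print(round(df_effect, 2), round(df_error, 2))

# A factor with two levels cannot violate sphericity, so epsilon is 1.
print(gg_corrected_df(n_subjects=22, n_levels=2, epsilon=1.0))
```

With 22 participants and three levels, the uncorrected degrees of freedom are 2
and 42; an epsilon of about 0.88 yields the reported 1.77 and 37.14. For the
two-level discrete-vs.-continuous factor the degrees of freedom stay at 1 and 21.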



5. Discussion 



The results offer insights into various aspects of uncertainty visualization. 
The first thing to note is that, as expected, the number of categories has an 
influence on people's abilities to make judgments about uncertainty. Simply 
put, the fewer categories participants had to distinguish the better their 
performance was in terms of correct answers. This result can be explained by
numerous studies in both perception and cognition literature that indicate 
that the more information that has to be distinguished and kept in working 
memory, the more difficult it is to reason with this information (Lloyd & 
Bunch 2005). 

However, and this is an important finding for research on visualizing
uncertainty, the downward trend of performance can be stopped using appropriate
visualizations. In the case reported here, the visually more salient
visualization means that complement uncertainty visualizations (the two salient
levels of noise width and noise grain) were able to counterbalance the negative
effect of an increased number of uncertainty categories (as revealed by the
significant interaction effects and the graph in Figure 6). This is especially
true for the combined effect of high values of noise width and noise grain.

The comparison of discrete and continuous visualizations of uncertainty
paints the same general picture, that is, the more complex the information is
that is offered to participants, the more errors they make. We only compared the
lowest level of grain and width for both discrete and continuous uncertainty
visualization. In this combination it is clear that the increase in the number
of categories leads to worse performance and that on all levels the performance
is worse if more than one category of uncertainty is present in the
areas which participants compared.



6. Conclusion 

We have presented a study to evaluate the noise annotation lines for
visualizing attribute uncertainty. In a web-based survey, users had to compare
the uncertainty values for two equally-sized areas. This was done for discrete
and continuous uncertainty representations (see section 3.4).

From the experiment, the following results could be concluded:

• the number of uncertainty categories has a significant influence on the
participants' judgment (with more categories, user performance decreases)

• the variation of the design parameters noise width and noise grain did
not significantly change user performance

• there is a decrease of user performance when more uncertainty categories
have to be distinguished, but this can be counterbalanced with
changes in the design of the noise grid

• more complex uncertainty information ("continuous" uncertainty) leads
to a decrease in user performance, although this trend narrowly missed
statistical significance (p=.057)

• there were no significant effects with respect to response time 

In this experiment, we did not consider the aspect of intuitiveness. As we
assume that noise is a good metaphor for uncertainty, it would be interesting
to evaluate if people can intuitively understand and utilize noise annotation
lines in comparison to other extrinsic (e.g., glyphs, sine amplitude) or
intrinsic approaches (e.g., saturation, opacity or whiteness). Besides, we did
not evaluate the influence of background colors with different contrast
levels. An interesting aspect would be if and how different colors affect the
usability of the method.